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Abstract 



We consider the counting rate estimation of an unknown radioactive source, which 
emits photons at times modeled by an homogeneous Poisson process. A spectrometer 
converts the energy of incoming photons into electrical pulses, whose number provides a 
rough estimate of the intensity of the Poisson process. When the activity of the source 
is high, a physical phenomenon known as pileup effect distorts the measurements and 
introduces a significant bias to the standard estimators of the source activities. We 
show in this paper that the problem of counting rate estimation can be interpreted as a 
sparse regression problem. We suggest a post-processed version of the Least Absolute 
Shrinkage and Selection Operator (LASSO) to estimate the photon arrival times, and 
derive necessary conditions which guarantee that the arrival times will be selected. The 
performances of the proposed approach are studied on simulations and real datasets. 

1 Introduction 

Rate estimation of a point process is an important problem in nuclear spectroscopy. An 
unknown radioactive source emits photons at random times, which are modeled by an ho- 
mogeneous Poisson process. Each photon which interacts with a semiconductor detector 
produces electron-hole pairs, whose migration generates an electrical pulse of finite dura- 
tion. We can therefore estimate the activity of the source by counting the number of activity 
periods of the detector. We refer the reader to [11] and [12] for further insights on the physi- 
cal aspects in this framework. However, when the source is highly radioactive, the durations 
of the electrical pulses may be longer than their interarrival times, thus the pulses can over- 
lap. In gamma spectrometry, this phenomenon is referred to as pileup. Such a distortion 
induces an underestimation of the activity, which become more severe as the counting rate 
increases. This issue is illustrated in Figure [T] 

In its mathematical form, the current intensity as a function of time can be modeled as 
a general shot-noise process 
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Figure 1: Example of a spectrometric signal. The red part is an example of piled up electrical 
pulses. 

where {Ek, A; > 1} and {$^(5), k > 1} are respectively the energy and the shape of the 
electrical pulse associated to the k-th photon. By analogy with queuing models, we call 
{W{t), t > 0} the workload process. The pulse shapes {$fc(s), A; > 1} are assumed to belong 
to a parametric family of functions Fe, G C M". The restriction of the workload process to 
a maximal segment where it is strictly positive is referred to as a busy period, and where it 
is as idle period. In practice, we observe of sampled version of ([T]) with additional noise, 
and wish to estimate from this recorded digital signal the counting rate activity. 

The problem of activity estimation has been extensively studied in the field of nuclear 
instrumentation since the 1960's (see [7j or (TH] for a detailed review of these early contri- 
butions; classical pileup rejection techniques are detailed in yj). Early papers on pileup 
correction focus specifically on activity correction methods, such as the VPG (Virtual Pulse 
Generator) method described in [211 [25] . Moreover, it must be stressed that these techniques 
are strongly related to the instrumentation used for the experiments. Recent offline methods 
are based on direct inversion techniques [20] or computationally intensive methods [1], and 
are usually not fitted for very high counting rates. It is of interest to consider fast, event-by- 
event pile-up correctors for real-time applications, as proposed in [22] for calorimetry and 
in [T7] for scintillators. One of the main advantages of the methods developed in [20] is that 
they do not rely on any shape information of the time signal, but rather on the alternance of 
the idle states (where no photon impinges to the detector) and busy states (when we observe 
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electrical pulses) of the detector. However, when the activity of the radioactive source is too 
high, we observe very few transitions from busy to idle states, thus making this information 
statistically irrelevant. 

In the latter case, it is therefore necessary to introduce additional assumptions on the 
pulse shapes (e.g. to specify Fe), and to estimate both the workload sample path on a 
relevant basis. This can be formally viewed as a regression problem. However, due to the 
nature of the physical phenomenon, and since Poisson processes usually represent occurrences 
of rare events, the regressor chosen to estimate the workload must be sparse as well. Since 
the seminal papers [I9] and [8] , representation of sparse signals has received a considerable 
attention, and significant advances have been made both from the theoretical and applied 
point of view. Several recent contributions |10] suggest efficent algorithms yielding estimators 
with good statistical properties, thus making sparse regression estimators a possible option 
for real-time processing. Indeed, LASSO provides a sparse solution close to the real signal 
for the ^2-norm. However, since we are not interested in the reconstruction of the signal for 
activity estimation, but rather in the Poissonian arrivl times, it is of interest to investigate the 
consistency in selection of the sparsity pattern. Numerous recent works have been devoted to 
this general question about LASSO, the first ones being [2H] and [H]. Both papers introduced 
independently the so-called irrepresentability condition as a necessary condition for selection 
consistency. More recently, p3j developed the conditions under which the irrepresentability 
condition is also a sufficient one. We also refer to [27], [15], [21] and for recent results on 
consistency in the ^2-sense for the signal estimation; note however that the estimation of the 
activity of the source is related to the selection consistency issue, whereas the consistency in 
the £2 sense should be used for energy spectrum reconstruction. 

The paper is organized as follows. Section |2] presents the model and the derivation of 
the estimator of the counting rate. This estimation can be roughly seen as a post-processed 
version of the LASSO. This sparse regressor is briefly recalled, as well as its main properties 
regarding consistency in selection. We present in Section [3] theoretical results showing that 
the activity of the source can be recovered almost as well as the best estimator we could build 
from a full knowledge of the Poisson process and discrete observations with a high probability. 
Section |4] illustrates on some applications the effectiveness of the proposed approach, both 
on simulations and real data. Details of the calculations and proofs of the presented results 
are detailed in the appendix. 



2 Sparse regression based method for activity estima- 
tion 

2.1 Model and assumptions 

Assume that we observe a signal uniformly sampled on T = {0 = tQ,ti,t2, ■ ■ ■ ,tAr-i = T}. 
We further on denote the subdivision step T/N by 6t, and by y = [yo,yi, ■ ■ ■ ,yN--i] the 
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resulting observations. The observations stem from a discrete version of the generahzed 
shot-noise process [18] : 

M 

yi = J2En<^n{t^-Tn)+e^, 0<t<N-l, (2) 

where {T„ , 1 < n < M} is the sample path of an homogeneous Poisson process with constant 
intensity A, {En, 1 < n < M} is a sequence of independent and identically distributed 
(iid) random variables representing the photon energies, with unknown probability density 
function /,{$„, 1 < n < M} is a sequence of functions to be defined later which characterize 
the electric pulse shapes generated by the photons, and {et, 0<2<A^ — l}isa sequence of 
iid Gaussian random variables with zero mean and variance cr^ representing the additional 
noise of the input signal. We further denote by y = {y^, ■ ■ ■ ,y7v-i) the noise-free part of y, 
in other words = X^^i En^niti — Tn) for alH = 0, ■ ■ ■ ,N — 1. 

For most spectrometers, an electrical pulse created by a single photon has a characteristic 
shape, usually in most detectors a rapid growth created by the charge collection and an 
exponential decay as the charges migrate to the detector. We therefore make the following 
assumptions: 

Assumption 1 the functions $„ belong to a fixed positive span of truncated Gamma func- 
tions 

Teit) = ce-t'' ■ e^'^' 1(0 <t<T), (3) 

where < r < T , 6 = {9i,92) E and Cg is a normalizing constant so that X]il:0^ ^eiti)"^ = 
1 for all n. 

Assumption 2 The random variables {En-, 1 < < M} are bounded by positive and known 
constants: 

< ^min <En< ^max, for all U . (4) 



2.2 Overview of the estimation procedure 

In the following, the cardinality of a set A is denoted by |y4|. Our objective is to estimate 
A given y. It is well known that if {T„, 1 < n < M} are the points of an homogeneous 
Poisson process, the inter-arrival times are iid random variables with common exponential 
distribution with parameter A. Therefore, A can be consistently estimated by 

A.^^. (5) 

J-M 

However, y is a discrete-time signal, therefore ([s]) cannot be attained since we are restricted 
to use only times in T. Define T„ as the closest point to T„ in T: 

Tn = fn + dn, 0< <6t (6) 
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and M as the number of distinct values of T„, the best estimate of A is defined as 

V^I. (7) 

It is hkely that Aopt < Ac, since M < M; however, provided X6t is small, Ac and Aopt should 
remain close. The main idea of this paper is to plug in ([T]) estimates of M and Tm, as now 
explained. 

Let {6^'^'' = {6'^^\ O^"*)] /c = 1, 2, ■ ■ ■ ,p} be a discrete domain of gamma functions pa- 
rameters. We consider the span of all these gamma shapes F^cfe), sampled on 7. For any r 
in [0,T], we can define the following N x p matrix whose columns are the samplings of the 
previous gamma functions shifted by r: 



= [r0(k)(ti -r)] 



0<i<N-l,l<k<p 



When r = tj we denote A(^) by Aj, and shall refer to this matrix as the time block associated 
to the j-th point. We can now define a global dictionary A by concatenating the time blocks 
A,: 

A=[Ao Ai ■■■ A^_i] (9) 

Should the T„ belong to 7 and the to Tg^k), then the vector f3 such that y = A/3 
would be naturally sparse. In a realistic situation the sampled shapes ^n{ti — Tn) do not 
belong to the span of A, because T„ ^ T almost surely and is random. However, provided 
X6t remains sufficiently small, the projection of individual pulses on the span of A yields 
only few consecutive non-zero coefficients. If the signal is estimated as J2m=o l^m, the 
set J{P) = {0 < m < N — 1] (5^ ^ 0} still contains much fewer elements than A^. We would 
like to recover this sparse block pattern J(/3), so we make use of the LASSO estimator [19j: 



/3(r) = arg min 



1 
2iV 



rn=0 



N-1 



+ ^ } , (10) 



m=0 



where the tuning parameter r quantifies the tradeoff between sparsity and estimation pre- 
cision. LASSO provides a sparse estimator (3 = [(3q, ■ ■ ■ , f3j/_i]'^ such that the linear model 
y = A/3 approximates accurately the signal y. In practice, ( [Io| ) can be efficiently computed 
by the LARS algorithm [TU]. Note that the group-LASSO [26j also exploits the time blocks 
decomposition of /3 and provide a block-sparse regressor. However, in this paper, we cannot 
assume the groups to be fully known, due to the incompleteness of A, and present only the 
results achieved by the classical LASSO in our problem. 



Assuming the solution (10) is known, the estimation of A should be carefully done. It is 
tempting to estimate the arrival times with the set J(/3) and the total number of occurrences 
by M = I J(/3)|, then plug this data into ([T]). However J(/3) may contain consecutive active 
time blocks which do not all correspond to real arrival times. This is not surprising: since 
A is incomplete, J(/3) may itself be distinct from {T„, n > 0}. Another explanation is that 
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columns in close time blocks are too correlated to ensure block sparsity coherence of the 
LASSO [28]. In this paper we suggest an additional thresholding step to the estimation of 
A to circumvent this issue, that is 



1. solve (10) to obtain /3{r). 

2. set all the (3m such that ||/3m||i < rj to zero, where 77 is a user defined threshold; 

3. estimate recursively T„ = nnn{t > T„_i ; f3t-i = 0,f3t ^ 0}, and M = \{t E 7 ; f3t-i = 

4. Estimate the activity as 

A(r,r/)^^ (11) 

We refer to steps 2 and 3 as "post processing" steps. Both steps can be heuristically 
understood as follows: step 2 in introduced since time blocks containing negligible weights 
are probably selected to improve slightly the estimation, but are not related to pulses start; 
indeed in realistic situations all the pulses considered, including the real ones, start with 
similar sharp slopes, but decrease differently, which makes these "negligible" time blocks 
appear behind the pulse start. In step 3 we merge consecutive selected time blocks due to 
high correlations between blocks and incompleteness of the dictionary, as mentioned above. 



3 Theoretical results 

In order to guarantee some consistency in estimation as well as in selection, previous works 
imposed conditions on the dictionary A, for instance low correlations between columns [SI El 
[T3t [5] or positivity of minors of specific sizes [IHl El [27j. The estimation procedure described 
in this paper is close to [15], which suggest improvements of LASSO by hard-thresholding co- 
efficients, so that only representative variables are selected. In [28], [23], the irrepresentability 
condition is presented, and is proved to be necessary if we wish selection consistency. Roughly 
speaking if Ai and A2 denote respectively the active columns and their complement, one 
can define the two Gram matrices An = ^AjAi , A21 = ^^A^^Ai. The irrepresentability 
condition in its less general form stresses that ||742iA]"]^'^||^ < 1 — r/ for > 0, where the infi- 
nite (triple) norm ||-B||oo of an operator B is defined as ||-B||oo = max ||i?5;||oo ; moreover to 

ensure sign consistency the parameter r should be chosen between two bounds depending on 
the least singular value of An. We adapt in this section these conditions to our framework, 
and derive bounds of our proposed estimator in a general way. 



3.1 Characterization of block sparsity patterns 

Given y and A, we are interested in subsets of T which support vectors /3 approximating 
well y, whereas satisfying an assumption similar to For P C 7 any subset of subdivision 
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points, let Ap be the concatenation of time blocks for j G P. For any vector x G M^, 
X > shall mean that every component of x is nonnegative. Following ([s]), for any vector 
13 G M.^P we shall denote by /3 = /3f , . . . , f3jf_i]'^ its natural decomposition along time 
blocks. Recall that J{(3) designates the block sparsity pattern of (3, that is J{f3) = {0 < m < 

Definition 1 Let a and (3ram < /3max be positive numbers, and P G 7. We shall say that 

P ^ ^/3min,/3max,« ^ff' 

min lly - Ap /3|L < a, (12) 

where C denotes the set of all nonnegative vectors (3 = [f3j, f3'[, . . . , (3jf_i]'^ such that J{f3) = 
P and that (3min < WPjWi ^ Pmax for all < j < N — 1. As a limit case of the previous 



definition, we shall denote by the set of supports defined as in (12) with /3min = 0,/3n 

+00. 



Equivalently (12) expresses a condition on the solution of a constrained least squares problem 
associated to y, hence we can assume that the residual 6p = y — Ap /3and e are independent 
random vectorsfor (3 in C. Note that by definition the set i3^in,i3^^^,a could be empty.In 
addition of the previous definition, for any subset P, one can define an associated potential 
intensity estimate as: 

Ap = . (13) 

max P 

Note that setting P = {T„, > 0} in the latter yields Ap = Aopt as defined in Therefore, 
we shall provide in the next section numerical conditions involving P, /3min, /3max, a, the noise 
level a and A, so that the obtained estimator is close to Aopt with a high probability. 



3.2 Main results 



We introduce hereafter further conventions: the complementary subset of P is denoted by 
P; for P, Q two such subsets, we define the matrix Apq as 



A 



PQ 



1 

N 



AlAc 



(14) 



Given P C T, for any value of the parameter r > define (3p{r) 



'P,N-1 

as the LASSO minimizer (10) under the additional constraint J{(3p) C P. Let P G 

Ap /3p + 5p + e, where [3p is a (non negative) vector, 

-1 



2 



for some a > 0, we can write y 

and 5p is the residual vector y — Ap {3 p. We also define the function t(x) = x " e 
which classically provides an upper bound of the tail of the Gaussian distribution. The next 
proposition describes for which values of r the regressor /3p is also a global minimizer of the 
LASSO. 
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Proposition 1 Denote by fimin > the least eigenvalue of App. Suppose the matrix infinite 
norm \\App\\^ satisfies for some rj > 0: 



I^PpIIoo < 



(1-^) 



then the following condition holds on the operator infinite norm of A-ppApp: 

\\AppApp\\^ < 1 — ?7 

Moreover, if r is chosen such that 



r > max 



2 \\6p\\^_ 2V2a \og{N-\P\)p 



rjVN 



N 



(15) 



(16) 



(17) 



then f3p is a minimizer of the LASSO functional (10) with probability tending to 1 as N ^ oo. 

Proof: See Appendix [B| ■ 

In the later results, we assume for convenience that is an even number, and we define 



Qmi-)- r(iv/2) 



N/2-1 ^ 

EX 
Id 

k=0 



(1^ 



The asymptotic behvavior of (18) are detailed in Lemma [ij We also define S = jf-^J-^j 
the Gram matrix of any single time block (obviously independent of j in our setting detailed 
at 2.2), which has positive entries less or equal than 1. The following result will be the main 



technical tool for the two main theorems. 

Proposition 2 Suppose we have P G Tq, for a > 0, that isy = Apf3p+6p+e for f3p > and 
\\6p\\2 < ay/N. Let now r be a LASSO parameter value, and l3{r) be the LASSO regressor. 
Define 



> a + r} 



Pa+r = {k & P 

as the subset of the most representative blocks of (ip; Then 

max Pa+r — T < max J{/3{r)) 
and denoting by N^+r the number of connected components of |J {k — r, k + t) 

h G -Pa + r 



iV„+,<|J(/3(r))| 



(19) 



(20) 



(21) 



with probability greater than 



1-p E t 



S/3_ 



P,k 



— a — r 



kePa 



2a 



Qn/2 



(22) 
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Assume now that the LASSO solution /3(r) has only nonnegative coefficients. For all I E 7 
define Ci = 9 /3i{r) — r . Then we have 

oo 

max{/c, Cfc > 2a} — r < max P (23) 
and denoting by N2a+r the number of connected components of |J (/ — r, / + r) then: 

{l,Ci>2a} 

N2a+r < \P\ (24) 

V Na 

both with probability greater than 1 — \{k, Ck > 2a;}| ■ t ' 



2a 



Proof: See Appendix [C| ■ 

Proposition |2] is of practical interest. Roughly speaking, its first statement says that if P 
is a sparsity pattern which allows a good representation of y, then the maximal value of the 
LASSO sparsity pattern J(/3(r)) and its cardinahty are lower bounded by terms depending 
on the most representative elements of P only with a high probability. Conversely, the 
second part of Proposition [2] states that the maximum and cardinality of the set of the 
most significant active blocks of (3{r) can be upper bounded by terms depending on P only. 
Therefore, the latter result relates /3p (which is most likely connected to the true sparsity 
pattern) to the solution of LASSO /3(r) after thresholding. 

Remark Note that the probability upper bounds presented in Proposition [2] may be 
negative if \fNa tends to 0. This illustrates the necessity of a more selective thresholding in 
that case, since in the latter we tend to accept all active blocks as arrival times since a tends 
to 0. In other words, one should threshold by numbers greater than the LASSO parameter 



to get reliable comparisons in (21) and (24). □ 

Remark Regarding the last result in Proposition |2} up to our knowledge no constraint 
ensuring the positivity of the LASSO regressor (5 appears in previous works. Nevertheless 
the results obtained on real data show it is a case worth considering: the true signal is not so 
far from being built out from A, and the energies are positive by nature. Actually, it would 
have been more convenient to consider the LASSO under the additional constraint that all 
blocks contain nonnegative coefficients only, for this is a crucial point in many proofs. Note 
that the optimization problem remains convex, thus this issue can be addressed, at least 
theoretically. □ 

Proposition [2] is used to prove next theorems, where we compare an "oracle" intensity \p 

for a threshold > 0. We recall that th 
> ?7 to obtain selected times Ti, ■ ■ ■ , Tj^ 



to the proposed intensity estimator (11) computed for a threshold rj > Q. We recall that the 
blocks of (5{r) are selected by thresholding Piir) 
as the minima of each open component formed by consecutive active blocks. We eventually 
compute X{r,ri) = M/Tj^. 
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Theorem 1 Assume that all the notations and assumptions of Proposition\^hold, and that 
P G ^j^.^.a for < r] < fi (the value /i is not important in this result). Define a = min S{hj), 

and let < r < slt] — a be a LASSO parameter, provided the right member is indeed positive; 
- M 

denote by A(r, 2//) = ^— be the estimator computed by (|11|). Then one has the following 



T 



M 



inequality controlling possible overestimation of Xp: 
A(r, 2r]) - Xp 



< 



M 


max J{/3{r)) 


maxj/c, 




>2ri}-T 

1 


N2a+r 




max{A;, 


K 





Xp; (25) 



and this holds with probability greater than 



^ ^ ( y/N [sLri — a — r] 



2a 



-QN/2{^—^]-\Jmi 



a" 



2a 



Proof: See appendix [D| 



Note that in (25) the only unknown term besides Xp is A^2a+r, which depends on the 



unknown error bound a; nevertheless the quotient M /N2a+r appearing in (25) justifies this 
third step of the proposed algorithm. Indeed, this is the best strategy to expect M / N2a+r 
to be as close to 1 as possible, thus minimizing the risk to overestimate the intensity. 



The last result relies on a "block mutual coherence" bound introduced in (26). It says 
roughly that if the correlations between active blocks are appropriately bounded, the under- 
estimation of the activity is also controlled by a lower bound. 

Theorem 2 Assume that conditions of Theorem^hold, and let C be the maximal correlation 
between two columns belonging to two distinct time blocks in P, that is 



C = max I maxAkmihj' 
{k, m)eP^ ^ 

k ^ m 



assume that the following condition holds: 

ari-a-2C\P\fi- 2^2 C (a^ + a^) > 0, 
and that r has been chosen so that 



2C 



+ r <ar7 — a — 2C|P|/i. 



(26) 



(27) 
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Then the following inequality controlling possible underestimation of Xp does hold under 
probability bounded from below as in Theorem^- 



Proof: See appendix [Dj I 
Note that the quantity 



M 



\Jim)\ 



M 
\Jikr))\ 



T 



M , 



< A(r, 2ri) - Xp 



(28) 



1 



T, 



is less than 1, thus the LHS of (28) controls 



indeed some underestimation. The two latter theorems are written as a function of a general 
subset P, though in practice the main case of interest is when P is the support generated 
by the T„, thus providing Aopt- Not surprisingly, the obtained results depend on unknowns 
quantities such as \P\; indeed the inequalities depend heavily on the ratio A^t, which controls 
both the sparsity of the optimal P and the accuracy of the dictionary. 



One could interpret 



M 



in (28) as some average number of active consecutive blocks 



(which stay so after thresholding by 2//) per estimated arrival time; this emphasizes that the 
sparsity parameter r should be chosen large enough in order to get a significant upper bound 
on the activity. More interestingly, (26) shares strong similarity with traditional bounds 



on the so called "mutual coherence" of the dictionary enforcing selection consistency p]; 
indeed the term "block mutual coherence" condition is adequate here, and it is satisfying 
for our purpose since we do not aim for precise selection consistency. Since C depends 



on the minimal distance between points in P, (26) is satisfied as soon as this distance is 
greater than some fixed positive number which depends on the shapes in the dictionary (i.e. 
their parameters) and the sampling width 6t. The dictionary is set up from the beginning, 
independently from the observed Poissonian sample, therefore this last question is equivalent 
to the one solved in Lemma |2] in appendix [A} 



4 Applications 

We present in this section results on realistic simulations, which emphasize the effectiveness 
of the proposed approach when compared to a standard method (comparison to a fixed 
threshold and estimation of A by means of the idle times of the detector, see [20]). The 
proposed algorithm for counting rate estimation is then studied on a real dataset. 

4.1 Results on simulations 
4.1.1 Experimental settings 

The performances of the proposed approach are investigated for 50 points of an homogeneous 
Poisson process whose intensity A varies from 0.05 to 0.4, which corresponds to physical 
activities from 5.10^ and up to 4.10^ photons per second when the signal is sampled to 10 



11 



MHz. Those numbers are related to high or very high radioactive activities, as mentioned 
for example in [1]. The energies {'jn,i^ > 0} are drawn accordingly to a Gaussian density 
truncated at 0, with mean 50 and variance 5. We present both results in the case of a good 
Signal Noise Ratio (cr = 1), as can be found in Gamma spectrometry applications. 

In this paper we chose a dictionary made of Gamma functions, in order to represent a 
charge collection increase and an exponential decrease for one single photon's pulse; more 
specifically, assuming that we observe points of the sample signal, the j-th column of 
the dictionary A is build accordingly to ([s]) and In order to check the robustness of 
the approach and its practical implementation for real-time instrumentation, the signals are 
simulated in two different settings: 

• for each point of the Poisson process, a shape is taken randomly from the dictionary 
A] this case will later on be denoted by (I). 

• for each point of the Poisson process, a shifted Gamma is created with randomly 
chosen parameters 61,62- In our experiments, both parameters are drawn uniformly 
accordingly to 61 ~ U{[0; 10]) and 62 ~ f/([0; 2]) (case denoted by (II)) 

It is obvious that the standard framework for regression is (I); however, as mentioned earlier, 
we also want to investigate how the algorithm behaves when the dictionary is not rich enough 
to cover all the possible shapes, and check the effectiveness of the post-processing step 
described in Proposition [2] and Theorem [TJ This allows to use the proposed approach on real- 
world experiments where fast algorithms and small dictionaries for real-time implementations 
are in order. For one given activity, the estimator is computed 1000 times by means of the 
proposed method, and by means of the standard method aforementioned, both in (I) and 
(II) cases. In all cases, the parameter was chosen equal to 3a. 



4.1.2 Simulation results and discussion 



Figure |2] represents a portion of the simulated signal in case (II) for A = 1, as well as 
the provided estimation and estimated time arrivals. We can observe that the obtained 
regressor fits well the incoming signal, and that a careful choice of allows to find most of 



the arrival times. The boxplots displayed in Figures 3(a) and 3(b) represent the distribution 



of the estimators of A (the actual value of A is displayed in the a;-axis) when using the 
standard method counting rate estimation, and the results obtained by our method are 



given in Figures 3(c) and 3(d) It can be seen from these results that the proposed algorithm 
provides an estimator with smaller variance, thus making it more appropriate for counting 
rate estimation. 

The high variance in the standard thresholding method can be easily explained. As A 
increases, so does the number of pileups, hence the number of individual pulses and arrival 
times are underestimated. Both phenomena yield a poor estimate of A. Regarding the 
estimator obtained through the Lasso reconstruction of the signal, the results obtained in 
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Figure 2: Simulated signal (blue) and LASSO regressor (red), with associated time arrivals. 



cases where A is high (e.g. greater than 0.15) are much better than those of the standard 
method: we observe a much smaller variance, and for the intensities 0.05 to 0.2 the obtained 
results are very close to the actual counting rate. 

When A becomes higher, several pulses are likely to start between two consecutive sam- 
pling points. Thus, the suggested algorithm may be misled in treating both as one single 
impulse, which explains why A is underestimated. However the data is obtained from a sam- 
pled signal, therefore the actual A cannot be well estimated when X6t is too high. Indeed, 
a better insight is obtained when comparing the values of our estimate with Xopt instead of 
A. It is is made in Figures 4(a) and 4(b)[ We observe an almost linear fit between both 
estimators, thus showing numerically that the proposed approach provides values similar to 
Xopt, which is the best estimate we could build from a full knowledge of T„ and of the sampled 
signal, but is in practice unattainable. 



4.2 Applications on real data 

We applied the proposed method for counting rate estimation on real spectrometric data 
from the ADONIS system described in , which is sampled to 10 MHz. The actual activity 
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Figure 3: Distribution of the obtained counting rate estimators. 



is 400000 photons per second, which corresponds to an intermediate activity. Figure [5] shows 
the use of the proposed algorithm on a real dataset. It can be observed from the latter figures 
that a very incomplete dictionary is more than sufficient to retrieve the starting points of 
each individual pulses. However, the post-processing step we suggest in this paper is required 
to estimate the activity of the radioactive source. The obtained estimated activity is 3.99e4, 
which conforms both to the simulations and the knowledge of the dataset. 



5 Conclusion 

We presented in this paper a method based on sparse representation of a sampled spectro- 
metric signal to estimate the activity of an unknown radioactive source. Based on a crude 
dictionary, we suggest a post-processed version of the LASSO to estimate the number of 



14 



(a) Proposed estimate of A versus Xopt - case (I) (b) Proposed estimate of A versus Aopt - case (II) 

Figure 4: Comparison of A with Aopt- 
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Figure 5: Results on real data: input discrete signal (blue), and active /inactive blocks (red). 
We observe several well-separated pileups. 



individual photonic pulses and their arrival times. Results on simulations and real data 
both emphasize the efficiency of the method, and the small size of the dictionary make the 
implement for real-time applications accessible. It was theoretically shown that although 



15 



the standard conditions are not met per se for the LASSO to estimate the input signal in 
a sparse manner, we can derive some conditions which guarantee that the number of indi- 
vidual pulses and arrival times are well estimated. This is made possible by the fact that 
we do not wish to reconstruct the input signal, but rather find some partial information. 
Further aspects should focus on the joint estimation of A and the energy distribution, as 
well as the estimation of the activity in a nonhomogeneous case, and should appear in future 
contributions. 



A Technical lemmas 

Lemma 1 For each positive integer k, let Qk{x) he the incomplete Gamma function defined 



by (18), and let Xk be any sequence of real numbers such that for some 7 > 1 we have > 7 /c 



for all k. Then for all m G M.' 

k'^Qkixk) = O ((1 _ ^ + log7)A;)) ^ 

Proof: Since Xk > k, we have for each <i < k x\/i\ < x^jkl. Thus 

Qk{xk) < e-^^ k = ^e-^'^x^^ (29) 

As k tends to infinity, Stirling's formula gives k \ ~ \/2'nk (^)'^, hence the right hand side 



(RHS) of (29) is equivalent to the left hand side (LHS) of the following inequality 
1 



Xk) 



< — ^ k^/^ exp {{k - 1) log7 + (1 - 7)^) 

v2 TT 



When multiplied by any power of A;, the RHS of the latter inequality tends to as A; goes to 
infinity, since 1 — 7 + log 7 < 0, which completes the proof. ■ 

Lemma 2 Suppose a homogeneous Poisson process of intensity A is observed in the interval 
[0,T], and let 6 > such that \^T5 < 1. The probability that all interarrival times are 
greater than 6 is bounded from below by 1 — \^T5. 

Proof: Let us compute the probability that one interarrival time is smaller than 5. 
Denote by T„ the ra-th point of the process sample path, and by Nt the number of points on 
[0,t]. It is known (see e.g. [2]) that 



/(m,-,T„)|AfT=n)(Ml, • • • ,Un) 

= fiTT TT TT \(lln1l.n ?/,„'l = 

T 
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where i — 1 . . .n} are the order statistics of n independent random variables uniformly 

distributed on [0,T]. When n < 1 whether the Poissonian instants are separated by a 
distance greater than 5 is an empty question; when n>2 and 2 < i < n we have: 

p(r, - r,_i < 5\Nt = n) = — Voi(oo 

where Jlj = {0 < Mi < ••• < Un < T ; Ui — Ui^i < 5}. For all 1 < A; < n we set 
incrfe — Uk — Uk-i (we set uq — just above), so it is equivalent to write 

= {incr^ > 0, 1 < i < n; 

n 

incrj < 6 and incr^ < T} 

k=l 

We have now the decomposition of fli along the (disjoint) slices defined by incr^ = t, < 
t < 5: 



[j a(i); 

0<t<5 

^i{t) = {incrj > 0, for all j ^ i, incrj = t; incr^ <T — t} 
Integrating now along the variable t we obtain: 

r.5 



Vo\{Qi) = f Vo[{ni{t)) dt 

5 n-1 

Jo (n-iy. ^ 
Hence we have P{{Ti - Ti_i < 5} | A^^ = n) = 1 - (l - |)" therefore we get for all n > 2 that 

P{Ti - Ti^i < 5 for some 2<i<n\Nt^n) 

<{n-l) 

We can of course consider that the same probabihty is equal to as n = 0, 1. Due to the 
equality '^^{n — l) — — {x — l) exp(a:;) + 1, we get by conditioning that the probability that 

n>2 ^' 

one interarrival time is smaller than S is not greater than 

\nrnn V / r \ ' 

exp(-AT)5^ — (n-1) 1-1-^ 

n>2 I \ J- J 

= AT - [A(T - 5) - 1] exp(-A5) - 1 
The lemma follows from Taylor inequality. ■ 
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B Proof of Proposition [T] 



All along the proof we will write /3p = Pp{r) for simplicity. Let z such that ||z||^ < 1, since 
the £00— norm is dominated by the £2— norm we get 



\AppApp ■ z||^ 



^ II^PpIIoo 



■ W^PP ' ^||2 



< 



By (|15|) we obtain (|16|). The rest of the proof follows 

ec 

Al {5p + e 



conditions for (3p imply therefore the existence of a vector zp G [— 1; 1]''^'^ such that: 

1 



App 



rzp 



N 



ApplL V\P\p 

with mild modifications. KKT 



(30) 



Still by KKT conditions /3p is a global minimum as soon as: 

1 



6p + e + Ap{(3p- (3p) 



r Z-, 



(31) 



with ||zp||oo < 1- Plugging in (31) the expression obtained at (30), and introducing the 
matrix 



1 



[Ap — Ap Ap^p App^ 



we obtain: 



1 



Hp {6p + e) + r A-ppAppZp = rzp 



By assumption (|16|) it is sufficient that the following condition holds: 

1 



H'p {6 + e] 



< rrj 



Note that Hp can be rewritten as Hp 



ip \ 4-1 



Ap_ 



T 



/N 



(32) 



Ap ] , showing that 



the columns of Hp are the projections of the normalized columns of Ap onto the orthogonal 
complement of the columns of Ap. It thus follows that all columns of Hp have normalized 
^2-norm bounded by 1 since this is true for A-p. Denoting by Hi any column of Hp, then 
H]" e is a Gaussian random variable of variance less than cr^. Consequently: 



1 



H]^e 



> 



rrj 



<5^P( \HTe\> 



2 

N rrj 



< 2t 



N rrj 



(iV-|P|)p 



(33) 
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In order to make (33) tend to 0, we need that 



r > 



2 72(7 /log(A^- |P|)p 



N 



Now we have also 



hTs 



N 



hence 



< ^ as soon as r > 



2 \\5p\\. 



¥p\\2 < " ' 



r]VN 



C Proof of Proposition [2 

For any A; G T and any positive number p, we first define Tp^k as the following subset of T: 



7p,k = 7\{tj e T, Akj{i,j) <p, l<t,j <p} 



(34) 



In other words, the set Tp ^ gathers sampling times tj "close" to tfc in the sense that some cor- 
relations between columns of the j-th time block and columns of the k-th time blockcontains 
columns are above the threshold p. Obviously for p > p we have f, C Tp^k- 

We keep the same notations as in Appendix Bj Let a > and P be the support of a 
nonnegative regressor (3p satisfying y — Ap /3p = ||5p|L < a^/N . Let (3 be the LASSO 

2 

regressor. For all k in P, using the same notations as in Proposition [T| we have by the KKT 
conditions: 



S /3p, fc + ^ Ak^m Pp, m - ^ Ak^l (3l 
k=/=meP I 



rzk-j^Al{5p + e), (35) 



for some vector G 
can write Xl; ^k,i A 
is nonnegative, it fol 



-1, 1]^. Let < 7 < 1; since all elements of A^^i are bounded by 1, we 
< (1-7) E/GT^,fe IIAIIi + 7 since the vector T^k^meP ^k,m^p,m 



ows immediately from (35) that: 



1 

iV' 



|S/3p,fc||oo - a- — ||A^e||oo - r 



<(l-7) Yl IIAIIi+7ll^l|i • (36) 



We can bound roughly the norm 
1 



since f3 minimizes the Lasso functional: 



1 



(37) 
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and as a result: 



2Nr 



\Sp + e\\l 



(38) 



(36) and (38) imply the weaker inequality 



|S/3p,fc||oo - a- ^||A^e||oo - r 



<(l-7) E IIAI|i+7(l|/3p||i + ^||<^p + e||^)- (39) 



7 ,fc 



The three terms in the RHS of (39) are bounded as follows. First expand the quadratic 
expression 

|5p + e||^<a^ + 2^ + l||e||?; (40) 



since e and 6p are independent, the variable Z 
than ^j^- On the other hand, ^ ||e||2 is distributed according to a x^(^) distribution, we 



T X 

7^ is centered with variance smaller 

N VN 



thus get when is even P(||e||2 > N{a'^ + a^)) = Qn/2 



2 



Eventually: 



< Qn/2 



+ 



Denote by = ||S /3p,fc||oo — for all k in Pa+r "we can write 



Cfc -r 



P — A^e > 



For all k in P^+r, the real number 7^ defined by 7^ 



S/3p,fc 


— a — r 

00 




/5p 


1 / 



is between and 



1/2, so (|39]) yields 

3dli>0| >l-pt 



iV (Cfc-r ) 

2(T 



— Qn/2 



'jf.,k 



A 



To conclude note that J^iej 
I e {k - T,k + t); this proves (121]) and ([20|) 



> implies the weaker condition that A 7^ for some 
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The second part of the proposition is proved as follows. Assume that /3 has nonnegative 
components. If > 2 a, using a similar argument as in the previous proof with the KKT 

a 

conditions expressed at the time block A;, we get with probability greater than 1 — t ' 
that for any 7 G [0, 1]: 

Cfc - a 



2a 



< 



1-7) E ll/3p,.lli + 7l|/3p||i (41) 



2 



Setting < Tfc = — ^ we have Il/5p,j||i > 0, i.e. 7^^^^ H P 7^ 0. Hence (24) and 



(23) follow with probability bounded as written. 



D Proofs of Theorem [T] and Theorem |2 



In order to prove Theorem [T| we combine the results of (20) and (24): whenever ar] — a > r 

(42) 



note that Pa+r = P (19), so one has 

maxP — T < max J {13) 
\/N [arj — a — r 



with probability at least 1 — |P| pt 



2a 



4a2 /jY^2^^2^ 
^-^^/My^^l-Now 



by Proposition [2j and the trivial inequality \{k, Ck > 2q;}| < | J(/3)| the following inequalities 
occur with the probability announced in the statement: 

N2a+r max{fc, Cfc > 2a] — 2r 



max J{f3{r)) max{A;, Ck > 2a} — r 



< 



2a+r 



max P — T 



< 



\P\ 



max J0) maxP maxP 



Ap (43) 



(25) is straightforward from (43), since any A; in T such that \\(3k\\i > 2r7 satisfies in 
particular > 2a, since the stronger r + 2a < 2a.r] does hold, thus Theorem [ij holds. 

The proof of Theorem [2] relies on elements of the proof of Proposition [2j In the latter, 
we found that whenever < 7 satisfies 



7 < mm 



9f3p,k 


— a — r 1 




(3p 


1 J 


\ 



then the number of connected components of the union |J X,, ^ is lesser or equal than 

fcGPa-|-r 

= P because in particular arj > a + r. 



\J{l3{r))\. If r satisfies (27), we get that P, 
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Moreover, when 



C < 



a 



2{\P\fi + ^) 



the previous number of connected components is simply equal to \P\. This gives precisely 



condition (27), which can be realized when (26) is satisfied: indeed the LHS of (27) achieves 
its minimum at r = a/2 a (cr^ + a^), which is then plugged back. These considerations 



together with (21), (23) imply the inequality: 



Ap < 



mr))\ 



max{fc, Ck > 2a} — r 



(44) 



Since Tr? < max{/c, > 2a}, the result follows from (11) and (44). There is therefore an 



interval of values of r for which (44) occurs under the estimated probability, which proves 
Theorem IH 
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